Investigating Statistical Techniques for Sentence-Level Event Classification
Authors
Abstract
Correctly classifying sentences that describe events is an important task for many natural language applications, such as Question Answering (QA) and Summarisation. In this paper, we treat event detection as a sentence-level text classification problem. We compare the performance of two approaches to this task: a Support Vector Machine (SVM) classifier and a Language Modeling (LM) approach. We also investigate a rule-based method that uses hand-crafted lists of terms derived from WordNet. These terms are strongly associated with a given event type, and can be used to identify sentences describing instances of that type. We use two datasets in our experiments, and evaluate each technique on six distinct event types. Our results indicate that the SVM consistently outperforms the LM technique for this task. More interestingly, we discover that the manual rule-based classification system is a very powerful baseline that outperforms the SVM on three of the six event types.
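As a rough illustration of the rule-based idea described in the abstract, the sketch below labels a sentence with every event type whose term list shares a token with it. The event types, the term lists, and the function name classify_sentence are hypothetical placeholders: the abstract does not give the actual WordNet-derived lists or the matching rule, so this should be read as a minimal sketch under those assumptions and not as the authors' system.

```python
import re

# Hypothetical term lists for two illustrative event types.
# The paper derives its lists from WordNet; the real terms are not
# given in the abstract, so these are assumptions for demonstration.
EVENT_TERMS = {
    "die": {"die", "died", "dies", "killed", "death", "dead"},
    "attack": {"attack", "attacked", "bombing", "bombed", "ambush"},
}


def classify_sentence(sentence, event_terms=EVENT_TERMS):
    """Return the set of event types whose term list matches any token."""
    # Lowercase and keep alphabetic tokens only (a deliberately simple tokenizer).
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    # A sentence is assigned an event type if it shares at least one term.
    return {etype for etype, terms in event_terms.items() if tokens & terms}


if __name__ == "__main__":
    print(classify_sentence("Three soldiers were killed in the ambush."))
    print(classify_sentence("The committee met on Tuesday."))
```

A sentence can receive several labels or none, which matches treating each event type as its own binary decision. A real system would likely need stemming or lemmatisation in place of the exact-token match used here.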
Similar resources
Survey on Clustering Algorithm for Sentence Level Text
Clustering is an extensively studied data mining problem in the text domain. The problem finds numerous applications in customer segmentation, classification, collaborative filtering, visualization, document organization, and indexing. In text mining, sentence clustering is one of the processes used within general text mining tasks. Several clustering methods and algorithms are used...
Classification of encrypted traffic for applications based on statistical features
Traffic classification plays an important role in many aspects of network management, such as identifying the type of transferred data, detecting malware applications, applying policies to restrict network access, and so on. Early methods in this field used obvious traffic features such as port number and protocol type to classify the traffic type. However, recent changes in applicat...
A Systematic Comparison of Smoothing Techniques for Sentence-Level BLEU
BLEU is the de facto standard machine translation (MT) evaluation metric. However, because BLEU computes a geometric mean of n-gram precisions, it often correlates poorly with human judgment at the sentence level. Therefore, several smoothing techniques have been proposed. This paper systematically compares 7 smoothing techniques for sentence-level BLEU. Three of them are first proposed in this...
The Role of Knowledge-based Features in Polarity Classification at Sentence Level
Though polarity classification has been extensively explored at the document level, there has been little work investigating feature design at the sentence level. Due to the small number of words within a sentence, polarity classification at the sentence level differs substantially from document-level classification in that the resulting bag-of-words feature vectors tend to be very sparse, resulting in a lower ...
Data mining with cellular discrete event modeling and simulation
Data mining is the process of extracting patterns from data. A main step in this process is referred to as data classification. In this work, we investigate the use of the Cell-DEVS formalism for classifying data. The cells in a Cell-DEVS based grid are individually very simple but together they can represent complex behavior and are capable of self-organization. Three classifier models are imp...
Journal title:
Volume, issue:
Pages: -
Publication date: 2008